Implementing Voting Constraints With Finite State Transducers
Authors
Abstract
We describe a constraint-based morphological disambiguation system in which individual constraint rules vote on matching morphological parses, and then its implementation using finite state transducers. Voting constraint rules have a number of desirable properties. The outcome of the disambiguation is independent of the order in which the local contextual constraint rules are applied, so the rule developer is relieved from worrying about conflicting rule sequencing. The approach can also combine statistically and manually obtained constraints, and incorporate negative constraints that rule out certain patterns. The transducer implementation also has a number of desirable properties compared to other finite state tagging and light parsing approaches that are implemented with automata intersection. The most important of these is that, since constraints do not remove parses, there is no risk of an overzealous constraint "killing a sentence" by removing all parses of a token during intersection. After describing our approach, we present preliminary results from tagging the Wall Street Journal Corpus. With about 400 statistically derived constraints and about 570 manual constraints, we attain an accuracy of 97.82% on the training corpus and 97.29% on the test corpus. We then describe a finite state implementation of our approach and discuss various related issues.
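The voting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system or their transducer implementation. The pattern representation (a tuple of tags matched against consecutive tokens), the vote values, the example tags, and the function name `disambiguate` are all assumptions made for illustration. The sketch only shows why additive votes make the result independent of rule order, and why no token can lose all of its parses.

```python
from collections import defaultdict


def disambiguate(sentence, constraints):
    """Pick one parse per token by summing constraint votes.

    sentence:    list of candidate-tag lists, one list per token.
    constraints: list of (pattern, vote) pairs; a pattern is a tuple of tags
                 matched against consecutive tokens (hypothetical format).
    """
    votes = [defaultdict(int) for _ in sentence]
    for pattern, vote in constraints:
        # Try the pattern at every position where it fits in the sentence.
        for start in range(len(sentence) - len(pattern) + 1):
            window = sentence[start:start + len(pattern)]
            if all(tag in cands for tag, cands in zip(pattern, window)):
                # Votes are only added; no parse is ever removed.
                for offset, tag in enumerate(pattern):
                    votes[start + offset][tag] += vote
    # Selection happens once, at the end, per token.
    return [max(cands, key=lambda t: votes[i][t])
            for i, cands in enumerate(sentence)]


# Illustrative input: "the can rusts" with ambiguous parses (assumed tags).
sentence = [["DT"], ["MD", "NN", "VB"], ["VBZ", "NNS"]]
constraints = [
    (("DT", "NN"), 10),   # determiner followed by noun
    (("NN", "VBZ"), 5),   # noun followed by 3rd-person verb
    (("DT", "MD"), -10),  # negative constraint: determiner before modal
]

result = disambiguate(sentence, constraints)
reordered = disambiguate(sentence, list(reversed(constraints)))
```

Because each constraint only adds to a running total, applying the constraints in reverse order yields the same selection, and every token keeps at least one parse even when only negative constraints match it.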
Similar resources
Morphological Disambiguation by Voting Constraints
We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and disambiguation of all the tokens in a sentence is performed at the end by selecting parses that receive the highest votes. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence relieves the...
Lecture 8: Speech Recognition Using Finite State Transducers
In order to use HTK-trained speech recognition models with the AT&T speech recognition search engine, three types of conversion are necessary. First, you must convert the HTK-format hidden Markov models into ATT format acoustic models. Second, you’ll need to write finite state transducers for the language model, dictionary, and context dependency transducer. Third, acoustic feature files need t...
FSA Utilities: A Toolbox to Manipulate Finite-State Automata
This paper describes the FSA Utilities toolbox: a collection of utilities to manipulate finite-state automata and finite-state transducers. Manipulations include determinization (both for finite-state acceptors and finite-state transducers), minimization, composition, complementation, intersection, Kleene closure, etc. Furthermore, various visualization tools are available to browse finite-state autom...
Symbolic Tree Transducers
Symbolic transducers are useful in the context of web security as they form the foundation for sanitization of potentially malicious data. We define Symbolic Tree Transducers as a generalization of Regular Transducers as finite state input-output tree automata with logical constraints over a parametric background theory. We examine key closure properties of Symbolic Tree Transducers and we deve...
Constraining Separated Morphotactic Dependencies In Finite-State Grammars
[Morphology, Morphotactics, Finite State, Separated Dependencies] This paper examines dependencies between separated (non-adjacent) morphemes in natural-language words and a variety of ways to constrain them in finite-state morphology. Methods include running separate constraining transducers at runtime, composing in constraints at compile time, feature unification, and the use of FLAG DIACRITIC...